[Gluster-users] geo-replication not syncing files...
Dietmar Putz
putz at 3qmedien.net
Thu Nov 12 17:11:37 UTC 2015
Some new findings...
Yesterday evening I saved the output of 'find /gluster-export -type f
-exec ls -lisa {} \;' on all master nodes to look for stale linkfiles.
Afterwards the geo-replication restarted several times; the latest
restart was caused by the following (entire sequence below):
[ 16:06:45 ] - root at gluster-ger-ber-09
/var/log/glusterfs/geo-replication/ger-ber-01 $cat
*gluster-export.gluster.log | egrep "No such file|stale linkfile" | tail -1
[2015-11-12 15:58:19.491418] I
[dht-common.c:1572:dht_lookup_linkfile_cbk] 0-ger-ber-01-dht: lookup of
/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg on
ger-ber-01-replicate-0 (following linkfile) failed (No such file or
directory)
[ 16:07:18 ] - root at gluster-ger-ber-09
/var/log/glusterfs/geo-replication/ger-ber-01 $
In yesterday evening's output the mentioned linkfile exists on
ger-ber-09 and ger-ber-10 (subvolume ger-ber-01-replicate-1), but not on
ger-ber-01-replicate-0.
root at dp-server:/home/tron/gluster# grep
/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
ls-lisa-gluster-export-files-gluster-ger-ber-*
ls-lisa-gluster-export-files-gluster-ger-ber-09-int.out:86082371834 0
---------T 2 root root 0 Apr 18 2014
/gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
ls-lisa-gluster-export-files-gluster-ger-ber-10-int.out:86085383674 0
---------T 2 root root 0 Apr 18 2014
/gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
root at dp-server:/home/tron/gluster#
root at dp-server:/home/tron/gluster# grep 86082371834
ls-lisa-gluster-export-files-gluster-ger-ber-09-int.out
86082371834 0 ---------T 2 root root 0 Apr 18 2014
/gluster-export/.glusterfs/61/de/61def2a4-4a6d-4979-9d54-14b041e926a7
86082371834 0 ---------T 2 root root 0 Apr 18 2014
/gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
root at dp-server:/home/tron/gluster# grep 86085383674
ls-lisa-gluster-export-files-gluster-ger-ber-10-int.out
86085383674 0 ---------T 2 root root 0 Apr 18 2014
/gluster-export/.glusterfs/61/de/61def2a4-4a6d-4979-9d54-14b041e926a7
86085383674 0 ---------T 2 root root 0 Apr 18 2014
/gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
root at dp-server:/home/tron/gluster#
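The two greps above pair the linkfile with its .glusterfs gfid path via their shared inode number. The same lookup can be sketched as a small helper run directly on a brick (a sketch only; `show_gfid_links` is a hypothetical name, and GNU stat/find are assumed):

```shell
# show_gfid_links BRICK FILE: list all hardlinks of FILE under BRICK,
# which includes its .glusterfs/<aa>/<bb>/<gfid> twin.
show_gfid_links() {
    brick=$1
    file=$2
    ino=$(stat -c %i "$file")      # inode number shared by all hardlinks
    find "$brick" -inum "$ino" -print
}
```

For the file above, something like `show_gfid_links /gluster-export /gluster-export/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg` on ger-ber-09 should print both the linkfile path and its .glusterfs twin.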
After the restart the linkfile/gfid no longer exists...apparently some
consistency check is running in the background, and the geo-replication
process dies when such a linkfile problem occurs...
Should I have done a 'rebalance' on that volume before starting
geo-replication?
Currently I'm running a script which finds those linkfiles in the saved
file listings.
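Such a detection step could also run directly against a brick root instead of the saved listings; a minimal sketch (`find_linkfiles` is a hypothetical helper name). DHT linkfiles show up as zero-byte files whose mode is exactly the sticky bit, i.e. the ---------T entries above:

```shell
# find_linkfiles BRICK: print zero-byte, sticky-bit-only files under
# BRICK (DHT linkfile candidates), skipping the .glusterfs tree.
find_linkfiles() {
    brick=$1
    find "$brick" -path "$brick/.glusterfs" -prune -o \
         -type f -perm 1000 -size 0 -print
}
```

e.g. `find_linkfiles /gluster-export` on each master brick; before deleting anything, each hit could additionally be checked for the trusted.glusterfs.dht.linkto xattr (via getfattr) to confirm it really is a linkfile.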
I think I have to delete them manually and then observe how the
geo-replication behaves once it has been running for some hours without
frequent restarts...
best regards
dietmar
[2015-11-12 13:45:14.295292] I [fuse-bridge.c:5520:fini] 0-fuse:
Unmounting '/tmp/gsyncd-aux-mount-M_LLid'.
[2015-11-12 15:58:19.491418] I
[dht-common.c:1572:dht_lookup_linkfile_cbk] 0-ger-ber-01-dht: lookup of
/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg on
ger-ber-01-replicate-0 (following linkfile) failed (No such file or
directory)
[2015-11-12 15:58:19.493244] I
[dht-common.c:911:dht_lookup_unlink_stale_linkto_cbk] 0-ger-ber-01-dht:
Returned with op_ret 0 and op_errno 0 for
/cache/img/img/articles/5087ab4c8e57b6.01164210.jpeg
[2015-11-12 15:58:31.409954] I [glusterfsd.c:1959:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.6
(/usr/sbin/glusterfs --aux-gfid-mount
--log-file=/var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%
4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.%2Fgluster-export.gluster.log
--volfile-server=localhost --volfile-id=ger-ber-01 --client-pid=-1
/tmp/gsyncd-aux-mount-iDrtTp)
[2015-11-12 15:58:31.431290] I [socket.c:3645:socket_init] 0-glusterfs:
SSL support is NOT enabled
[2015-11-12 15:58:31.431321] I [socket.c:3660:socket_init] 0-glusterfs:
using system polling thread
[2015-11-12 15:58:31.495912] I [dht-shared.c:314:dht_init_regex]
0-ger-ber-01-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-11-12 15:58:31.497671] I [socket.c:3645:socket_init]
0-ger-ber-01-client-5: SSL support is NOT enabled
[2015-11-12 15:58:31.497694] I [socket.c:3660:socket_init]
0-ger-ber-01-client-5: using system polling thread
[2015-11-12 15:58:31.497815] I [socket.c:3645:socket_init]
0-ger-ber-01-client-4: SSL support is NOT enabled
[2015-11-12 15:58:31.497820] I [socket.c:3660:socket_init]
0-ger-ber-01-client-4: using system polling thread
[2015-11-12 15:58:31.497939] I [socket.c:3645:socket_init]
0-ger-ber-01-client-3: SSL support is NOT enabled
[2015-11-12 15:58:31.497945] I [socket.c:3660:socket_init]
0-ger-ber-01-client-3: using system polling thread
[2015-11-12 15:58:31.498059] I [socket.c:3645:socket_init]
0-ger-ber-01-client-2: SSL support is NOT enabled
[2015-11-12 15:58:31.498063] I [socket.c:3660:socket_init]
0-ger-ber-01-client-2: using system polling thread
[2015-11-12 15:58:31.498184] I [socket.c:3645:socket_init]
0-ger-ber-01-client-1: SSL support is NOT enabled
[2015-11-12 15:58:31.498189] I [socket.c:3660:socket_init]
0-ger-ber-01-client-1: using system polling thread
[2015-11-12 15:58:31.498298] I [socket.c:3645:socket_init]
0-ger-ber-01-client-0: SSL support is NOT enabled
[2015-11-12 15:58:31.498303] I [socket.c:3660:socket_init]
0-ger-ber-01-client-0: using system polling thread
[2015-11-12 15:58:31.498325] I [client.c:2294:notify]
0-ger-ber-01-client-0: parent translators are ready, attempting connect
on transport
[2015-11-12 15:58:31.498603] I [client.c:2294:notify]
0-ger-ber-01-client-1: parent translators are ready, attempting connect
on transport
[2015-11-12 15:58:31.498738] I [client.c:2294:notify]
0-ger-ber-01-client-2: parent translators are ready, attempting connect
on transport
[2015-11-12 15:58:31.498904] I [client.c:2294:notify]
0-ger-ber-01-client-3: parent translators are ready, attempting connect
on transport
[2015-11-12 15:58:31.499064] I [client.c:2294:notify]
0-ger-ber-01-client-4: parent translators are ready, attempting connect
on transport
[2015-11-12 15:58:31.499192] I [client.c:2294:notify]
0-ger-ber-01-client-5: parent translators are ready, attempting connect
on transport
Final graph:
+------------------------------------------------------------------------------+
1: volume ger-ber-01-client-0
2: type protocol/client
3: option remote-host gluster-ger-ber-11-int
4: option remote-subvolume /gluster-export
5: option transport-type socket
6: option username f2720838-abbd-4629-bd24-8040ceb794c1
7: option password 5ee70f0b-e537-4d11-8692-863864e227e4
8: option ping-timeout 10
9: option send-gids true
10: end-volume
11:
12: volume ger-ber-01-client-1
13: type protocol/client
14: option remote-host gluster-ger-ber-12-int
15: option remote-subvolume /gluster-export
16: option transport-type socket
17: option username f2720838-abbd-4629-bd24-8040ceb794c1
18: option password 5ee70f0b-e537-4d11-8692-863864e227e4
19: option ping-timeout 10
20: option send-gids true
21: end-volume
22:
23: volume ger-ber-01-replicate-0
24: type cluster/replicate
25: subvolumes ger-ber-01-client-0 ger-ber-01-client-1
26: end-volume
27:
28: volume ger-ber-01-client-2
29: type protocol/client
30: option remote-host gluster-ger-ber-09-int
31: option remote-subvolume /gluster-export
32: option transport-type socket
33: option username f2720838-abbd-4629-bd24-8040ceb794c1
34: option password 5ee70f0b-e537-4d11-8692-863864e227e4
35: option ping-timeout 10
36: option send-gids true
37: end-volume
38:
39: volume ger-ber-01-client-3
40: type protocol/client
41: option remote-host gluster-ger-ber-10-int
42: option remote-subvolume /gluster-export
43: option transport-type socket
44: option username f2720838-abbd-4629-bd24-8040ceb794c1
45: option password 5ee70f0b-e537-4d11-8692-863864e227e4
46: option ping-timeout 10
47: option send-gids true
48: end-volume
49:
50: volume ger-ber-01-replicate-1
51: type cluster/replicate
52: subvolumes ger-ber-01-client-2 ger-ber-01-client-3
53: end-volume
54:
55: volume ger-ber-01-client-4
56: type protocol/client
57: option remote-host gluster-ger-ber-07-int
58: option remote-subvolume /gluster-export
59: option transport-type socket
60: option username f2720838-abbd-4629-bd24-8040ceb794c1
61: option password 5ee70f0b-e537-4d11-8692-863864e227e4
62: option ping-timeout 10
63: option send-gids true
64: end-volume
65:
66: volume ger-ber-01-client-5
67: type protocol/client
68: option remote-host gluster-ger-ber-08-int
69: option remote-subvolume /gluster-export
70: option transport-type socket
71: option username f2720838-abbd-4629-bd24-8040ceb794c1
72: option password 5ee70f0b-e537-4d11-8692-863864e227e4
73: option ping-timeout 10
74: option send-gids true
75: end-volume
76:
77: volume ger-ber-01-replicate-2
78: type cluster/replicate
79: subvolumes ger-ber-01-client-4 ger-ber-01-client-5
80: end-volume
81:
82: volume ger-ber-01-dht
83: type cluster/distribute
84: option min-free-disk 200GB
85: subvolumes ger-ber-01-replicate-0 ger-ber-01-replicate-1
ger-ber-01-replicate-2
86: end-volume
87:
88: volume ger-ber-01-write-behind
89: type performance/write-behind
90: subvolumes ger-ber-01-dht
91: end-volume
92:
93: volume ger-ber-01-read-ahead
94: type performance/read-ahead
95: subvolumes ger-ber-01-write-behind
96: end-volume
97:
98: volume ger-ber-01-io-cache
99: type performance/io-cache
100: option min-file-size 0
101: option cache-timeout 2
102: option cache-size 1024MB
103: subvolumes ger-ber-01-read-ahead
104: end-volume
105:
106: volume ger-ber-01-quick-read
107: type performance/quick-read
108: option cache-size 1024MB
109: subvolumes ger-ber-01-io-cache
110: end-volume
111:
112: volume ger-ber-01-md-cache
113: type performance/md-cache
114: subvolumes ger-ber-01-quick-read
115: end-volume
116:
117: volume ger-ber-01
118: type debug/io-stats
119: option latency-measurement off
120: option count-fop-hits off
121: subvolumes ger-ber-01-md-cache
122: end-volume
123:
124: volume gfid-access-autoload
125: type features/gfid-access
126: subvolumes ger-ber-01
127: end-volume
128:
+------------------------------------------------------------------------------+
[2015-11-12 15:58:31.499624] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
0-ger-ber-01-client-2: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500026] I
[client-handshake.c:1677:select_server_supported_programs]
0-ger-ber-01-client-2: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2015-11-12 15:58:31.500138] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
0-ger-ber-01-client-4: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500160] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
0-ger-ber-01-client-5: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500565] I
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-2:
Connected to 10.0.1.101:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.500576] I
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-2:
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.500611] I [afr-common.c:4487:afr_notify]
0-ger-ber-01-replicate-1: Subvolume 'ger-ber-01-client-2' came back up;
going online.
[2015-11-12 15:58:31.500629] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
0-ger-ber-01-client-3: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500659] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
0-ger-ber-01-client-0: changing port to 49152 (from 0)
[2015-11-12 15:58:31.500673] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
0-ger-ber-01-client-1: changing port to 49152 (from 0)
[2015-11-12 15:58:31.501206] I
[client-handshake.c:450:client_set_lk_version_cbk]
0-ger-ber-01-client-2: Server lk version = 1
[2015-11-12 15:58:31.501239] I
[client-handshake.c:1677:select_server_supported_programs]
0-ger-ber-01-client-4: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2015-11-12 15:58:31.501380] I
[client-handshake.c:1677:select_server_supported_programs]
0-ger-ber-01-client-5: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2015-11-12 15:58:31.501479] I
[client-handshake.c:1677:select_server_supported_programs]
0-ger-ber-01-client-0: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2015-11-12 15:58:31.501540] I
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-4:
Connected to 10.0.1.147:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.501546] I
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-4:
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.501566] I [afr-common.c:4487:afr_notify]
0-ger-ber-01-replicate-2: Subvolume 'ger-ber-01-client-4' came back up;
going online.
[2015-11-12 15:58:31.501576] I
[client-handshake.c:1677:select_server_supported_programs]
0-ger-ber-01-client-3: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2015-11-12 15:58:31.501649] I
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-5:
Connected to 10.0.1.149:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.501654] I
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-5:
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.501729] I
[client-handshake.c:1677:select_server_supported_programs]
0-ger-ber-01-client-1: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2015-11-12 15:58:31.501753] I
[client-handshake.c:450:client_set_lk_version_cbk]
0-ger-ber-01-client-4: Server lk version = 1
[2015-11-12 15:58:31.501778] I
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-0:
Connected to 10.0.1.105:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.501783] I
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-0:
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.501797] I [afr-common.c:4487:afr_notify]
0-ger-ber-01-replicate-0: Subvolume 'ger-ber-01-client-0' came back up;
going online.
[2015-11-12 15:58:31.501820] I
[client-handshake.c:450:client_set_lk_version_cbk]
0-ger-ber-01-client-5: Server lk version = 1
[2015-11-12 15:58:31.501934] I
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-3:
Connected to 10.0.1.103:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.501939] I
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-3:
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.501982] I
[client-handshake.c:450:client_set_lk_version_cbk]
0-ger-ber-01-client-0: Server lk version = 1
[2015-11-12 15:58:31.502061] I
[client-handshake.c:1462:client_setvolume_cbk] 0-ger-ber-01-client-1:
Connected to 10.0.1.107:49152, attached to remote volume '/gluster-export'.
[2015-11-12 15:58:31.502067] I
[client-handshake.c:1474:client_setvolume_cbk] 0-ger-ber-01-client-1:
Server and Client lk-version numbers are not same, reopening the fds
[2015-11-12 15:58:31.504554] I [fuse-bridge.c:5022:fuse_graph_setup]
0-fuse: switched to graph 0
[2015-11-12 15:58:31.504588] I
[client-handshake.c:450:client_set_lk_version_cbk]
0-ger-ber-01-client-1: Server lk version = 1
[2015-11-12 15:58:31.504595] I
[client-handshake.c:450:client_set_lk_version_cbk]
0-ger-ber-01-client-3: Server lk version = 1
[2015-11-12 15:58:31.504657] I [fuse-bridge.c:3959:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22
kernel 7.22
[ 16:13:28 ] - root at gluster-ger-ber-09
/var/log/glusterfs/geo-replication/ger-ber-01 $
On 12.11.2015 at 13:23, Dietmar Putz wrote:
> Hello Aravinda,
>
> thank you for your reply...
> answers inline...
>
> best regards
> dietmar
>
> On 12.11.2015 at 07:22, Aravinda wrote:
>> Please paste some lines from log files where you see "Operation not
>> permitted" errors.
>
> these "Operation not permitted" warnings on the active slave are so
> frequent that there are no corresponding messages on the master...
> what I observe is that 1270 different gfids are processed again and
> again...
>
> [2015-11-12 10:53:55.355550] W [fuse-bridge.c:1942:fuse_create_cbk]
> 0-glusterfs-fuse: 29348: /.gfid/f5f57c10-381d-4537-8988-f22e16fab523
> => -1 (Operation not permitted)
> [ 10:56:59 ] - root at gluster-wien-05
> /var/log/glusterfs/geo-replication-slaves $cat *.gluster.log
> *.gluster.log.1 | grep "Operation not permitted" | wc -l
> 54756
> [ 11:02:28 ] - root at gluster-wien-05
> /var/log/glusterfs/geo-replication-slaves $cat *.gluster.log
> *.gluster.log.1 | grep "Operation not permitted" | awk '{print $7}' |
> head -2
> /.gfid/f857ac77-6afc-494f-88c7-304aec6cc8b5
> /.gfid/f87e0361-3915-495f-bdb2-0c87155f4ddd
> [ 11:02:48 ] - root at gluster-wien-05
> /var/log/glusterfs/geo-replication-slaves $cat *.gluster.log
> *.gluster.log.1 | grep "Operation not permitted" | awk '{print $7}' |
> sort | uniq | wc -l
> 1270
> [ 11:03:14 ] - root at gluster-wien-05
> /var/log/glusterfs/geo-replication-slaves $
>
> these gfids are filling up the geo-replication logfile on the master
> after each restart...
> according to the master_gfid_file.txt, the files behind the gfids are
> available on both master and slave.
>
>
> I observe many geo-replication restarts on the master node, and as far
> as I can see they are caused by two different errors: "No such file or
> directory" and "attempting deletion of stale linkfile"
>
> [ 11:30:08 ] - root at gluster-ger-ber-09
> /var/log/glusterfs/geo-replication/ger-ber-01 $cat
> *gluster-export.gluster.log | grep "No such file or directory" | tail -1
>
> [2015-11-12 09:26:35.942478] I
> [dht-common.c:1572:dht_lookup_linkfile_cbk] 0-ger-ber-01-dht: lookup
> of /cache/img/img/articles/508e370a62a2e4.87470810.jpeg on
> ger-ber-01-replicate-0 (following linkfile) failed (No such file or
> directory)
>
> [ 11:30:31 ] - root at gluster-ger-ber-09
> /var/log/glusterfs/geo-replication/ger-ber-01 $cat
> *gluster-export.gluster.log | grep "deletion of stale linkfile" | tail -1
>
> [2015-11-11 23:08:44.487825] I
> [dht-common.c:1480:dht_lookup_everywhere_cbk] 0-ger-ber-01-dht:
> attempting deletion of stale linkfile
> /cache/img/download/508813939afa33.63956784.pdf.gz on
> ger-ber-01-replicate-1
>
> [ 11:31:10 ] - root at gluster-ger-ber-09
> /var/log/glusterfs/geo-replication/ger-ber-01 $cat
> *gluster-export.gluster.log | egrep -i "no such file or
> directory|deletion of stale linkfile" | wc -l
> 115
>
> every time such an error appears the geo-replication process
> restarts...meanwhile 115 times since Nov 8th.
>
>
> "attempting to delete stale linkfile"
> example :
> /cache/img/articles/50856893226ce0.55808655.jpeg.gz
>
> [2015-11-10 23:48:14.273733] W [glusterfsd.c:1095:cleanup_and_exit]
> (-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fbf8be6c47d]
> (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7fbf8c13f182]
> (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7fbf8cc39265])))
> 0-: received signum (15), shutting down
> [2015-11-10 23:48:14.273768] I [fuse-bridge.c:5520:fini] 0-fuse:
> Unmounting '/tmp/gsyncd-aux-mount-QLm2DI'.
> [2015-11-11 01:14:26.972164] I
> [dht-common.c:1480:dht_lookup_everywhere_cbk] 0-ger-ber-01-dht:
> attempting deletion of stale linkfile
> /cache/img/articles/50856893226ce0.55808655.jpeg.gz on
> ger-ber-01-replicate-1
> [2015-11-11 01:14:26.999234] I
> [dht-common.c:825:dht_lookup_unlink_cbk] 0-ger-ber-01-dht:
> lookup_unlink returned with op_ret -> 0 and op-errno -> 0 for
> /cache/img/articles/50856893226ce0.55808655.jpeg.gz
> [2015-11-11 01:14:39.060592] I [glusterfsd.c:1959:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version
> 3.5.6 (/usr/sbin/glusterfs --aux-gfid-mount
> --log-file=/var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.%2Fgluster-export.gluster.log
> --volfile-server=localhost --volfile-id=ger-ber-01 --client-pid=-1
> /tmp/gsyncd-aux-mount-zXLm3a)
> [2015-11-11 01:14:39.093942] I [socket.c:3645:socket_init]
> 0-glusterfs: SSL support is NOT enabled
> [2015-11-11 01:14:39.093969] I [socket.c:3660:socket_init]
> 0-glusterfs: using system polling thread
> [2015-11-11 01:14:39.171506] I [dht-shared.c:314:dht_init_regex]
> 0-ger-ber-01-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
> [2015-11-11 01:14:39.173291] I [socket.c:3645:socket_init]
> 0-ger-ber-01-client-5: SSL support is NOT enabled
> [2015-11-11 01:14:39.173304] I [socket.c:3660:socket_init]
> 0-ger-ber-01-client-5: using system polling thread
> [2015-11-11 01:14:39.173426] I [socket.c:3645:socket_init]
> 0-ger-ber-01-client-4: SSL support is NOT enabled
> [2015-11-11 01:14:39.173430] I [socket.c:3660:socket_init]
> 0-ger-ber-01-client-4: using system polling thread
> [2015-11-11 01:14:39.173545] I [socket.c:3645:socket_init]
> 0-ger-ber-01-client-3: SSL support is NOT enabled
> [2015-11-11 01:14:39.173550] I [socket.c:3660:socket_init]
> 0-ger-ber-01-client-3: using system polling thread
> [2015-11-11 01:14:39.173667] I [socket.c:3645:socket_init]
> 0-ger-ber-01-client-2: SSL support is NOT enabled
> [2015-11-11 01:14:39.173671] I [socket.c:3660:socket_init]
> 0-ger-ber-01-client-2: using system polling thread
> [2015-11-11 01:14:39.173795] I [socket.c:3645:socket_init]
> 0-ger-ber-01-client-1: SSL support is NOT enabled
> [2015-11-11 01:14:39.173800] I [socket.c:3660:socket_init]
> 0-ger-ber-01-client-1: using system polling thread
> [2015-11-11 01:14:39.173916] I [socket.c:3645:socket_init]
> 0-ger-ber-01-client-0: SSL support is NOT enabled
> [2015-11-11 01:14:39.173921] I [socket.c:3660:socket_init]
> 0-ger-ber-01-client-0: using system polling thread
> [2015-11-11 01:14:39.173944] I [client.c:2294:notify]
> 0-ger-ber-01-client-0: parent translators are ready, attempting
> connect on transport
> [2015-11-11 01:14:39.174163] I [client.c:2294:notify]
> 0-ger-ber-01-client-1: parent translators are ready, attempting
> connect on transport
> [2015-11-11 01:14:39.174299] I [client.c:2294:notify]
> 0-ger-ber-01-client-2: parent translators are ready, attempting
> connect on transport
> [2015-11-11 01:14:39.174474] I [client.c:2294:notify]
> 0-ger-ber-01-client-3: parent translators are ready, attempting
> connect on transport
> [2015-11-11 01:14:39.174620] I [client.c:2294:notify]
> 0-ger-ber-01-client-4: parent translators are ready, attempting
> connect on transport
> [2015-11-11 01:14:39.174752] I [client.c:2294:notify]
> 0-ger-ber-01-client-5: parent translators are ready, attempting
> connect on transport
>
>
> All of the stale linkfiles were stored on ger-ber-01-replicate-1, which
> is gluster-ger-ber-09-int and gluster-ger-ber-10-int. Obviously Gluster
> successfully deletes the stale linkfile, because afterwards I cannot
> find the mentioned file anymore, but the geo-replication process
> restarts and does not process the same linkfile again.
> Looking around I found some more of these stale linkfiles...size 0 and
> no corresponding data file.
> So I come more and more to the conclusion that I possibly have a
> preceding problem...
>
> Can I delete such stale linkfiles without causing any problems?
>
> putz at sdn-de-gate-01:~/central/gluster-3.5$ ../mycommand.sh -H
> gluster-ger -c "ls -l /gluster-export/cache/img/articles/*.gz"
> Host : gluster-ger-ber-07-int
> Return Code : 2 check result on gluster-ger-ber-07-int
> -----------------------------------------------------
> Host : gluster-ger-ber-08-int
> Return Code : 2 check result on gluster-ger-ber-08-int
> -----------------------------------------------------
> Host : gluster-ger-ber-09-int
> ---------T 2 root root 0 Apr 23 2014
> /gluster-export/cache/img/articles/507c19b7e19af9.19424972.jpeg.gz
> ---------T 2 root root 0 Apr 23 2014
> /gluster-export/cache/img/articles/507d4e41edb6b0.81932229.jpeg.gz
> ---------T 2 root root 0 Apr 23 2014
> /gluster-export/cache/img/articles/507e941aed0c81.54017110.jpeg.gz
> ---------T 2 root root 0 Apr 23 2014
> /gluster-export/cache/img/articles/508569822e9278.45704411.jpeg.gz
> -----------------------------------------------------
> Host : gluster-ger-ber-10-int
> ---------T 2 root root 0 Apr 23 2014
> /gluster-export/cache/img/articles/507c19b7e19af9.19424972.jpeg.gz
> ---------T 2 root root 0 Apr 23 2014
> /gluster-export/cache/img/articles/507d4e41edb6b0.81932229.jpeg.gz
> ---------T 2 root root 0 Apr 23 2014
> /gluster-export/cache/img/articles/507e941aed0c81.54017110.jpeg.gz
> ---------T 2 root root 0 Apr 23 2014
> /gluster-export/cache/img/articles/508569822e9278.45704411.jpeg.gz
> -----------------------------------------------------
> Host : gluster-ger-ber-11-int
> Return Code : 2 check result on gluster-ger-ber-11-int
> -----------------------------------------------------
> Host : gluster-ger-ber-12-int
> Return Code : 2 check result on gluster-ger-ber-12-int
> -----------------------------------------------------
>
>
> The other reason for restart ... "No such file or directory"...still
> no clue...
>
> "No such file or directory"
>
> [2015-11-10 11:12:39.975390] I [fuse-bridge.c:5520:fini] 0-fuse:
> Unmounting '/tmp/gsyncd-aux-mount-JXonZH'.
> [2015-11-10 13:07:39.532569] I
> [dht-common.c:1572:dht_lookup_linkfile_cbk] 0-ger-ber-01-dht: lookup
> of /cache/img/articles/505c5538cd9d21.37413891.jpeg.gz on
> ger-ber-01-replicate-0 (following linkfile) failed (No such file or
> directory)
> [2015-11-10 13:07:39.563707] I
> [dht-common.c:911:dht_lookup_unlink_stale_linkto_cbk]
> 0-ger-ber-01-dht: Returned with op_ret 0 and op_errno 0 for
> /cache/img/articles/505c5538cd9d21.37413891.jpeg.gz
> [2015-11-10 13:07:51.483713] I [glusterfsd.c:1959:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version
> 3.5.6 (/usr/sbin/glusterfs --aux-gfid-mount
> --log-file=/var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.%2Fgluster-export.gluster.log
> --volfile-server=localhost --volfile-id=ger-ber-01 --client-pid=-1
> /tmp/gsyncd-aux-mount-H9kZUh)
> [2015-11-10 13:07:51.511921] I [socket.c:3645:socket_init]
> 0-glusterfs: SSL support is NOT enabled
> [2015-11-10 13:07:51.511943] I [socket.c:3660:socket_init]
> 0-glusterfs: using system polling thread
> [2015-11-10 13:07:51.567950] I [dht-shared.c:314:dht_init_regex]
> 0-ger-ber-01-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
> [2015-11-10 13:07:51.570322] I [socket.c:3645:socket_init]
> 0-ger-ber-01-client-5: SSL support is NOT enabled
> [2015-11-10 13:07:51.570340] I [socket.c:3660:socket_init]
> 0-ger-ber-01-client-5: using system polling thread
> [2015-11-10 13:07:51.570497] I [socket.c:3645:socket_init]
> 0-ger-ber-01-client-4: SSL support is NOT enabled
> [2015-11-10 13:07:51.570504] I [socket.c:3660:socket_init]
> 0-ger-ber-01-client-4: using system polling thread
> [2015-11-10 13:07:51.570706] I [socket.c:3645:socket_init]
> 0-ger-ber-01-client-3: SSL support is NOT enabled
> [2015-11-10 13:07:51.570724] I [socket.c:3660:socket_init]
> 0-ger-ber-01-client-3: using system polling thread
> [2015-11-10 13:07:51.570888] I [socket.c:3645:socket_init]
> 0-ger-ber-01-client-2: SSL support is NOT enabled
> [2015-11-10 13:07:51.570895] I [socket.c:3660:socket_init]
> 0-ger-ber-01-client-2: using system polling thread
> [2015-11-10 13:07:51.571086] I [socket.c:3645:socket_init]
> 0-ger-ber-01-client-1: SSL support is NOT enabled
> [2015-11-10 13:07:51.571103] I [socket.c:3660:socket_init]
> 0-ger-ber-01-client-1: using system polling thread
> [2015-11-10 13:07:51.571259] I [socket.c:3645:socket_init]
> 0-ger-ber-01-client-0: SSL support is NOT enabled
> [2015-11-10 13:07:51.571266] I [socket.c:3660:socket_init]
> 0-ger-ber-01-client-0: using system polling thread
> [2015-11-10 13:07:51.571298] I [client.c:2294:notify]
> 0-ger-ber-01-client-0: parent translators are ready, attempting
> connect on transport
> [2015-11-10 13:07:51.571602] I [client.c:2294:notify]
> 0-ger-ber-01-client-1: parent translators are ready, attempting
> connect on transport
> [2015-11-10 13:07:51.571862] I [client.c:2294:notify]
> 0-ger-ber-01-client-2: parent translators are ready, attempting
> connect on transport
> [2015-11-10 13:07:51.572118] I [client.c:2294:notify]
> 0-ger-ber-01-client-3: parent translators are ready, attempting
> connect on transport
> [2015-11-10 13:07:51.572350] I [client.c:2294:notify]
> 0-ger-ber-01-client-4: parent translators are ready, attempting
> connect on transport
> [2015-11-10 13:07:51.572647] I [client.c:2294:notify]
> 0-ger-ber-01-client-5: parent translators are ready, attempting
> connect on transport
>
>
> Would any other logfiles be helpful?
>
>
>>
>> Were there any errors observed during step 3 as specified in
>> http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.5
>
> Step 3 of the section "scheduling a downtime" ran without any errors.
> Step 3 of the section "upgrade steps for geo-replication" was started
> twice...after the first attempt I noticed that the slave-upgrade.sh
> script uses the standard ssh port. After temporarily reconfiguring each
> slave's ssh daemon to port 22, the script ran without any problems in a
> screen session and asked for each slave's password...
>
>>
>> Some rows in status output are still in "Not Started", are there any
>> errors in /var/log/glusterfs/geo-replication/*.log in those nodes?
> yes, the other master nodes are not permitted to access the slave
> nodes because of iptables rules on the slaves...only
> gluster-ger-ber-08-int and gluster-ger-ber-09-int are allowed to access
> each slave node.
> Should that be adjusted so that each master node can access each slave
> node...or does it 'just' influence the reliability of geo-replication
> without affecting the geo-rep problem?
>
> [2015-11-09 12:45:40.30857] E [resource(monitor):204:errlog] Popen:
> command "ssh -p 2503 -oPasswordAuthentication=no
> -oStrictHostKeyChecking=no -i
> /var/lib/glusterd/geo-replication/secret.pem gluster-wien-02 gluster
> --xml --remote-host=localhost volume info aut-wien-01" returned with
> 255, saying:
> [2015-11-09 12:45:40.61755] E [resource(monitor):207:logerr] Popen:
> ssh> ssh: connect to host gluster-wien-02 port 2503: Connection timed out
> [2015-11-09 12:45:40.62242] I [syncdutils(monitor):192:finalize]
> <top>: exiting.
>
>>
>> regards
>> Aravinda
>>
>> On 11/10/2015 10:50 PM, Dietmar Putz wrote:
>>> Hi all,
>>>
>>> I need some help with a geo-replication issue...
>>> Recently I upgraded two 6-node distributed-replicated Gluster
>>> clusters from Ubuntu 12.04.5 LTS to 14.04.3 LTS and GlusterFS 3.4.7
>>> to 3.5.6. Since then the geo-replication does not start syncing but
>>> has remained in the state shown in the 'status detail' output below
>>> for about 48h.
>>>
>>> I followed the hints for upgrade with an existing geo-replication :
>>> http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.5
>>>
>>> The master_gfid_file.txt was created and applied to the slave
>>> volume, and geo-replication was started with the 'force' option.
>>> In the gluster.log on the slave I find thousands of lines with
>>> messages like:
>>> ".../.gfid/1abb953b-aa9d-4fa3-9a72-415204057572 => -1 (Operation not
>>> permitted)"
>>> and no files are synced.
>>>
>>> I'm not sure what's going on, and since there are about 40 TByte of
>>> data already replicated by the old 3.4.7 setup I am afraid to
>>> experiment...
>>> so here are some questions...maybe somebody can give me some hints...
>>>
>>> 1. As shown in the example below, the trusted.gfid of the same file
>>> differs between the master and slave volume. As far as I understood
>>> the upgrade howto, after applying the master_gfid_file.txt on the
>>> slave they should be identical on master and slave... is that right?
>>> 2. As shown in the config below, the change_detector is 'xsync'.
>>> Somewhere I read that xsync is used for the initial replication and
>>> that it switches to 'changelog' later, once the initial sync is done.
>>> Should I try to set the change_detector to 'changelog'... does that
>>> make sense?
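If changing the crawl mode is worth testing, the geo-rep config interface would look roughly like the following. This is a sketch using the session names from this mail; whether the value is spelled "changelog" or "change_log" in this build should be checked against the gsyncd documentation first:

```shell
# Set the change detection mode for the existing session
# (master volume ger-ber-01, slave aut-wien-01).
gluster volume geo-replication ger-ber-01 \
    ssh://gluster-wien-02::aut-wien-01 config change_detector changelog

# Read the option back to verify.
gluster volume geo-replication ger-ber-01 \
    ssh://gluster-wien-02::aut-wien-01 config change_detector
```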
>>>
>>> Any other ideas that could help me solve this problem?
>>>
>>> best regards
>>> dietmar
>>>
>>>
>>>
>>>
>>> [ 11:10:01 ] - root at gluster-ger-ber-09 ~ $glusterfs --version
>>> glusterfs 3.5.6 built on Sep 16 2015 15:27:30
>>> ...
>>> [ 11:11:37 ] - root at gluster-ger-ber-09 ~ $cat
>>> /var/lib/glusterd/glusterd.info | grep operating-version
>>> operating-version=30501
>>>
>>>
>>> [ 10:55:35 ] - root at gluster-ger-ber-09 ~ $gluster volume
>>> geo-replication ger-ber-01 ssh://gluster-wien-02::aut-wien-01 status
>>> detail
>>>
>>> MASTER NODE           MASTER VOL    MASTER BRICK       SLAVE                                 STATUS         CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> gluster-ger-ber-09    ger-ber-01    /gluster-export    gluster-wien-05-int::aut-wien-01      Active         N/A                  Hybrid Crawl    0              8191             0                0                  0
>>> gluster-ger-ber-11    ger-ber-01    /gluster-export    ssh://gluster-wien-02::aut-wien-01    Not Started    N/A                  N/A             N/A            N/A              N/A              N/A                N/A
>>> gluster-ger-ber-10    ger-ber-01    /gluster-export    ssh://gluster-wien-02::aut-wien-01    Not Started    N/A                  N/A             N/A            N/A              N/A              N/A                N/A
>>> gluster-ger-ber-12    ger-ber-01    /gluster-export    ssh://gluster-wien-02::aut-wien-01    Not Started    N/A                  N/A             N/A            N/A              N/A              N/A                N/A
>>> gluster-ger-ber-07    ger-ber-01    /gluster-export    ssh://gluster-wien-02::aut-wien-01    Not Started    N/A                  N/A             N/A            N/A              N/A              N/A                N/A
>>> gluster-ger-ber-08    ger-ber-01    /gluster-export    gluster-wien-04-int::aut-wien-01      Passive        N/A                  N/A             0              0                0                0                  0
>>> [ 10:55:48 ] - root at gluster-ger-ber-09 ~ $
>>>
>>>
>>> [ 10:56:56 ] - root at gluster-ger-ber-09 ~ $gluster volume
>>> geo-replication ger-ber-01 ssh://gluster-wien-02::aut-wien-01 config
>>> special_sync_mode: partial
>>> state_socket_unencoded:
>>> /var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.socket
>>> gluster_log_file:
>>> /var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.gluster.log
>>> ssh_command: ssh -p 2503 -oPasswordAuthentication=no
>>> -oStrictHostKeyChecking=no -i
>>> /var/lib/glusterd/geo-replication/secret.pem
>>> ignore_deletes: true
>>> change_detector: xsync
>>> ssh_command_tar: ssh -p 2503 -oPasswordAuthentication=no
>>> -oStrictHostKeyChecking=no -i
>>> /var/lib/glusterd/geo-replication/tar_ssh.pem
>>> working_dir:
>>> /var/run/gluster/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01
>>> remote_gsyncd: /nonexistent/gsyncd
>>> log_file:
>>> /var/log/glusterfs/geo-replication/ger-ber-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.log
>>> socketdir: /var/run
>>> state_file:
>>> /var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.status
>>> state_detail_file:
>>> /var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01-detail.status
>>> session_owner: 6a071cfa-b150-4f0b-b1ed-96ab5d4bd671
>>> gluster_command_dir: /usr/sbin/
>>> pid_file:
>>> /var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/ssh%3A%2F%2Froot%4082.199.131.2%3Agluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.pid
>>> georep_session_working_dir:
>>> /var/lib/glusterd/geo-replication/ger-ber-01_gluster-wien-02_aut-wien-01/
>>> gluster_params: aux-gfid-mount
>>> volume_id: 6a071cfa-b150-4f0b-b1ed-96ab5d4bd671
>>> [ 11:10:01 ] - root at gluster-ger-ber-09 ~ $
>>>
>>>
>>>
>>> [ 12:45:34 ] - root at gluster-wien-05
>>> /var/log/glusterfs/geo-replication-slaves $tail -f
>>> 6a071cfa-b150-4f0b-b1ed-96ab5d4bd671\:gluster%3A%2F%2F127.0.0.1%3Aaut-wien-01.gluster.log
>>> [2015-11-10 12:59:16.097932] W [fuse-bridge.c:1942:fuse_create_cbk]
>>> 0-glusterfs-fuse: 54267: /.gfid/1abb953b-aa9d-4fa3-9a72-415204057572
>>> => -1 (Operation not permitted)
>>> [2015-11-10 12:59:16.098044] W [defaults.c:1381:default_release]
>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.6/xlator/mount/fuse.so(+0xfb4d)
>>> [0x7fc9cd104b4d]
>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.6/xlator/mount/fuse.so(free_fuse_state+0x85)
>>> [0x7fc9cd0fab95]
>>> (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(fd_unref+0x10e)
>>> [0x7fc9cf52ec9e]))) 0-fuse: xlator does not implement release_cbk
>>> ...
>>>
>>>
>>> grep 1abb953b-aa9d-4fa3-9a72-415204057572 master_gfid_file.txt
>>> 1abb953b-aa9d-4fa3-9a72-415204057572 1050/hyve/364/14158.mp4
>>>
>>> putz at sdn-de-gate-01:~/central$ ./mycommand.sh -H
>>> gluster-ger,gluster-wien -c "getfattr -m . -d -e hex
>>> /gluster-export/1050/hyve/364/14158.mp4"
>>> ...
>>> master volume :
>>> -----------------------------------------------------
>>> Host : gluster-ger-ber-09-int
>>> # file: gluster-export/1050/hyve/364/14158.mp4
>>> trusted.afr.ger-ber-01-client-6=0x000000000000000000000000
>>> trusted.afr.ger-ber-01-client-7=0x000000000000000000000000
>>> trusted.gfid=0x1abb953baa9d4fa39a72415204057572
>>> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x54bff5c40008dd7f
>>>
>>> -----------------------------------------------------
>>> Host : gluster-ger-ber-10-int
>>> # file: gluster-export/1050/hyve/364/14158.mp4
>>> trusted.afr.ger-ber-01-client-6=0x000000000000000000000000
>>> trusted.afr.ger-ber-01-client-7=0x000000000000000000000000
>>> trusted.gfid=0x1abb953baa9d4fa39a72415204057572
>>> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x54bff5c40008dd7f
>>>
>>> ...
>>> slave volume :
>>> Host : gluster-wien-04
>>> # file: gluster-export/1050/hyve/364/14158.mp4
>>> trusted.afr.aut-wien-01-client-2=0x000000000000000000000000
>>> trusted.afr.aut-wien-01-client-3=0x000000000000000000000000
>>> trusted.gfid=0x129ba62c3d214b34beb366fb1e2c8e4b
>>> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x54bff5c40008dd7f
>>>
>>> -----------------------------------------------------
>>> Host : gluster-wien-05
>>> # file: gluster-export/1050/hyve/364/14158.mp4
>>> trusted.afr.aut-wien-01-client-2=0x000000000000000000000000
>>> trusted.afr.aut-wien-01-client-3=0x000000000000000000000000
>>> trusted.gfid=0x129ba62c3d214b34beb366fb1e2c8e4b
>>> trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x54bff5c40008dd7f
>>>
>>> -----------------------------------------------------
>>> ...
>>> putz at sdn-de-gate-01:~/central$
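The hex xattr values above can be compared programmatically. The gfid conversion below is straightforward; the xtime decoding assumes the marker xattr is two big-endian uint32 fields (seconds, then microseconds), which should be verified against the gluster source before relying on it:

```python
import uuid
from datetime import datetime, timezone

def gfid_hex_to_uuid(hex_xattr: str) -> str:
    """Convert a trusted.gfid value as printed by `getfattr -e hex`
    into canonical UUID form (the form used in master_gfid_file.txt)."""
    return str(uuid.UUID(hex_xattr.removeprefix("0x")))

def decode_xtime(hex_xattr: str):
    """Split the 8-byte marker xtime xattr into (seconds, microseconds).
    Assumption: two big-endian uint32 fields, sec then usec."""
    raw = bytes.fromhex(hex_xattr.removeprefix("0x"))
    return int.from_bytes(raw[:4], "big"), int.from_bytes(raw[4:], "big")

# Values copied from the getfattr output above.
master_gfid = "0x1abb953baa9d4fa39a72415204057572"
slave_gfid  = "0x129ba62c3d214b34beb366fb1e2c8e4b"

print(gfid_hex_to_uuid(master_gfid))   # matches the gfid in master_gfid_file.txt
print(gfid_hex_to_uuid(master_gfid) == gfid_hex_to_uuid(slave_gfid))  # False: gfids diverged

sec, usec = decode_xtime("0x54bff5c40008dd7f")
print(datetime.fromtimestamp(sec, tz=timezone.utc), usec)
```

Run against both sides, this confirms what the outputs above show: the xtimes agree, but the gfids do not, which is consistent with the "Operation not permitted" gfid-access failures on the slave.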
>>>
>>>
>>
>