[Gluster-users] Geo-replication stops every few days
Matt
matt at mattlantis.com
Mon Mar 30 16:10:35 UTC 2015
Hi List,
I run geo-replication on half a dozen volumes or so across several
locations. It works fine for all but one: our largest volume, a 2x2
distributed-replicated volume with about 40TB of mixed media on it. All
of these servers/volumes are running 3.4.6 on CentOS 6.
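For context, it's a plain 2x2 distribute-replicate layout, i.e. roughly
what you would get from a create command like the one below (brick hosts
and paths are made up for illustration, not our real ones):

$ gluster volume create media-vol replica 2 \
    node1:/bricks/media node2:/bricks/media \
    node3:/bricks/media node4:/bricks/media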
Every few days, geo-replication on this volume stops, even though the
status command still reports OK.
I've checked everything the geo-rep troubleshooting guide recommends,
such as clock drift between the nodes, and it all seems to be fine; the
checks I'm running look roughly like the ones below.
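(Slave URL and host names elided or made up here, and the command syntax
is from memory, so it may differ slightly on 3.4.)

# geo-rep session status, which keeps reporting OK
$ gluster volume geo-replication media-vol status

# rough clock-drift check across the nodes (hypothetical host names)
$ for h in node1 node2 node3 node4; do ssh $h date +%s; done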
Hopefully without spamming the list with too many logs, here are the
last few lines of what I hope is relevant from around the time logging
stops.
Any ideas would be much appreciated; I've been running up against this
intermittently for months.
On the master:
$ cat
ssh%3A%2F%2Froot%40192.168.78.91%3Agluster%3A%2F%2F127.0.0.1%3Amedia-vol.log
[2015-03-19 20:13:15.269819] I [master:669:crawl] _GMaster: completed
60 crawls, 0 turns
[2015-03-19 20:14:16.218323] I [master:669:crawl] _GMaster: completed
60 crawls, 0 turns
[2015-03-19 20:15:17.171961] I [master:669:crawl] _GMaster: completed
60 crawls, 0 turns
[2015-03-19 20:16:18.112601] I [master:669:crawl] _GMaster: completed
60 crawls, 0 turns
[2015-03-19 20:17:19.52232] I [master:669:crawl] _GMaster: completed 60
crawls, 0 turns
[2015-03-19 20:18:19.991274] I [master:669:crawl] _GMaster: completed
60 crawls, 0 turns
[2015-03-19 20:20:06.722600] I [master:669:crawl] _GMaster: completed
23 crawls, 0 turns
[2015-03-19 20:42:40.970180] I [master:669:crawl] _GMaster: completed 1
crawls, 0 turns
[2015-03-19 20:47:14.961935] I [master:669:crawl] _GMaster: completed 1
crawls, 0 turns
[2015-03-19 20:48:22.333839] E [syncdutils:179:log_raise_exception]
<top>: connection to peer is broken
$ cat
ssh%3A%2F%2Froot%40192.168.78.91%3Agluster%3A%2F%2F127.0.0.1%3Amedia-vol.gluster.log
[2015-03-19 20:44:17.172597] W
[client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-media-vol-client-2:
remote operation failed: Stale file handle. Path:
/data/v0.1/images/9c0ea93e-46b2-4a27-b31a-ce26897bd299.jpg
(0aadc99c-c3e1-455d-a5ac-4bcf04541482)
[2015-03-19 20:44:38.314659] W
[client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-media-vol-client-2:
remote operation failed: Stale file handle. Path:
/data/v0.1/images/28ed6b8f-ec29-4066-02be-d7ae9a5a7bb6.jpg
(7e97920c-f165-44c3-9868-8dd13cc2b8d0)
[2015-03-19 20:44:38.314738] W
[client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-media-vol-client-3:
remote operation failed: Stale file handle. Path:
/data/v0.1/images/28ed6b8f-ec29-4066-02be-d7ae9a5a7bb6.jpg
(7e97920c-f165-44c3-9868-8dd13cc2b8d0)
[2015-03-19 20:44:53.029449] W
[client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-media-vol-client-2:
remote operation failed: Stale file handle. Path:
/data/v0.1/images/5d150965-f687-43bb-9gae-8f2d04cb02de.jpg
(40676e42-47e0-4c2b-a7b9-9e4101f2e32d)
[2015-03-19 20:44:53.029557] W
[client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-media-vol-client-3:
remote operation failed: Stale file handle. Path:
/data/v0.1/images/5d150965-f687-43bb-9gae-8f2d04cb02de.jpg
(40676e42-47e0-4c2b-a7b9-9e4101f2e32d)
[2015-03-19 20:45:39.031436] W
[client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-media-vol-client-2:
remote operation failed: Stale file handle. Path:
/data/v0.1/images/38067d80-6a2b-498d-b319-ce6c77354151.jpg
(10e53310-0c88-43a7-aa3f-2b48f0720cc7)
[2015-03-19 20:45:39.031552] W
[client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-media-vol-client-3:
remote operation failed: Stale file handle. Path:
/data/v0.1/images/38067d80-6a2b-498d-b319-ce6c77354151.jpg
(10e53310-0c88-43a7-aa3f-2b48f0720cc7)
On the slave:
$ cat
8ed3c554-8d5d-4304-9cd1-cb9a17c0fd64:gluster%3A%2F%2F127.0.0.1%3Agtmmedia-storage.gluster.log
[2015-03-20 01:42:39.715954] I
[dht-common.c:1000:dht_lookup_everywhere_done] 0-media-vol-dht: STATUS:
hashed_subvol media-vol-replicate-1 cached_subvol null
[2015-03-20 01:42:39.716792] I
[dht-common.c:1000:dht_lookup_everywhere_done] 0-media-vol-dht: STATUS:
hashed_subvol media-vol-replicate-1 cached_subvol null
[2015-03-20 01:42:39.730075] I
[dht-common.c:1000:dht_lookup_everywhere_done] 0-media-vol-dht: STATUS:
hashed_subvol media-vol-replicate-1 cached_subvol null
[2015-03-20 01:42:39.730179] I [dht-rename.c:1159:dht_rename]
0-media-vol-dht: renaming
/data/v0.1/images-add/.D6C7BEBC-3D2B-413F-96EC-5AE7A44B36C4.jpg.Mmu2MZ
(hash=media-vol-replicate-1/cache=media-vol-replicate-1) =>
/data/v0.1/images-add/D6C7BEBC-3D2B-413F-96EC-5AE7A44B36C4.jpg
(hash=media-vol-replicate-1/cache=<nul>)
[2015-03-20 01:51:56.502512] I [fuse-bridge.c:4669:fuse_thread_proc]
0-fuse: unmounting /tmp/gsyncd-aux-mount-6Hn48F
[2015-03-20 01:51:56.596291] W [glusterfsd.c:1002:cleanup_and_exit]
(-->/lib64/libc.so.6(clone+0x6d) [0x3455ae88fd]
(-->/lib64/libpthread.so.0() [0x3455e079d1]
(-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x40533d]))) 0-:
received signum (15), shutting down
[2015-03-20 01:51:56.596309] I [fuse-bridge.c:5301:fini] 0-fuse:
Unmounting '/tmp/gsyncd-aux-mount-6Hn48F'.
$ cat
8ed3c554-8d5d-4304-9cd1-cb9a17c0fd64:gluster%3A%2F%2F127.0.0.1%3Agtmmedia-storage.log
[2015-03-16 13:19:54.624100] I [gsyncd(slave):404:main_i] <top>:
syncing: gluster://localhost:media-vol
[2015-03-16 13:19:55.752953] I [resource(slave):483:service_loop]
GLUSTER: slave listening
[2015-03-16 14:02:40.654535] I [repce(slave):78:service_loop]
RepceServer: terminating on reaching EOF.
[2015-03-16 14:02:40.661967] I [syncdutils(slave):148:finalize] <top>:
exiting.
[2015-03-16 14:02:51.987721] I [gsyncd(slave):404:main_i] <top>:
syncing: gluster://localhost:media-vol
[2015-03-16 14:02:53.141150] I [resource(slave):483:service_loop]
GLUSTER: slave listening
[2015-03-17 17:31:25.696300] I [repce(slave):78:service_loop]
RepceServer: terminating on reaching EOF.
[2015-03-17 17:31:25.703775] I [syncdutils(slave):148:finalize] <top>:
exiting.
[2015-03-17 17:31:37.139935] I [gsyncd(slave):404:main_i] <top>:
syncing: gluster://localhost:media-vol
[2015-03-17 17:31:38.228033] I [resource(slave):483:service_loop]
GLUSTER: slave listening
[2015-03-19 20:51:55.965342] I [resource(slave):489:service_loop]
GLUSTER: connection inactive for 120 seconds, stopping
[2015-03-19 20:51:55.979207] I [syncdutils(slave):148:finalize] <top>:
exiting.
Thanks,
-Matt