[Gluster-devel] spurious disconnects, broken graph?

Emmanuel Dreyfus manu at netbsd.org
Wed Mar 6 14:30:52 UTC 2013


Hi,

I still experience spurious disconnects with 3.4.0alpha.
My test case involves running 4 concurrent tar -xzf
processes, and it never survives more than a few hours.
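
In case it helps reproduce, the test is essentially the following
(the mount point and archive name are placeholders, not my exact
setup):

    # run 4 concurrent extractions on the glusterfs mount
    cd /mnt/gfs33
    for i in 1 2 3 4; do
        mkdir -p work$i
        ( cd work$i && tar -xzf /tmp/src.tgz ) &
    done
    wait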

The client log loops on these messages:
W [dht-diskusage.c:45:dht_du_info_cbk] 0-gfs33-dht: failed to get disk info from gfs33-replicate-1
W [dht-layout.c:179:dht_layout_search] 0-gfs33-dht: no subvolume for hash (value) = 2327848085
W [dht-layout.c:179:dht_layout_search] 0-gfs33-dht: no subvolume for hash (value) = 1965256737
W [dht-layout.c:179:dht_layout_search] 0-gfs33-dht: no subvolume for hash (value) = 4177819066
I [afr-common.c:3882:afr_local_init] 0-gfs33-replicate-1: no subvolumes up
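
The "no subvolume for hash" messages are consistent with
gfs33-replicate-1 being down: dht gives each directory a layout that
assigns a range of the 32-bit hash space to each subvolume, and when
a subvolume is unreachable its range can end up as a hole, so any
name hashing into that range has nowhere to go. A rough sketch of the
routing (the even two-way split is an assumption; real layouts are
assigned per directory and vary):

    # sketch: route a name's 32-bit hash to the subvolume whose
    # layout range contains it; a hole logs "no subvolume for hash"
    hash=2327848085                      # value from the log above
    if [ "$hash" -le 2147483647 ]; then  # assumed replicate-0 range
        echo "hash $hash -> gfs33-replicate-0"
    else                                 # assumed replicate-1 range
        echo "hash $hash -> gfs33-replicate-1 (down: no subvolume)"
    fi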

Here is the client volfile:
+------------------------------------------------------------------------------+
  1: volume gfs33-client-0
  2:     type protocol/client
  3:     option transport.socket.ssl-enabled false
  4:     option transport-type tcp
  5:     option remote-subvolume /export/wd3a
  6:     option remote-host silo
  7: end-volume
  8: 
  9: volume gfs33-client-1
 10:     type protocol/client
 11:     option transport.socket.ssl-enabled false
 12:     option transport-type tcp
 13:     option remote-subvolume /export/wd3a
 14:     option remote-host hangar
 15: end-volume
 16: 
 17: volume gfs33-client-2
 18:     type protocol/client
 19:     option transport.socket.ssl-enabled false
 20:     option transport-type tcp
 21:     option remote-subvolume /export/wd1a
 22:     option remote-host hotstuff
 23: end-volume
 24: 
 25: volume gfs33-client-3
 26:     type protocol/client
 27:     option transport.socket.ssl-enabled false
 28:     option transport-type tcp
 29:     option remote-subvolume /export/wd1a
 30:     option remote-host hangar
 31: end-volume
 32: 
 33: volume gfs33-replicate-0
 34:     type cluster/replicate
 35:     subvolumes gfs33-client-0 gfs33-client-1
 36: end-volume
 37: 
 38: volume gfs33-replicate-1
 39:     type cluster/replicate
 40:     subvolumes gfs33-client-2 gfs33-client-3
 41: end-volume
 42: 
 43: volume gfs33-dht
 44:     type cluster/distribute
 45:     subvolumes gfs33-replicate-0 gfs33-replicate-1
 46: end-volume
 47: 
 48: volume gfs33-write-behind
 49:     type performance/write-behind
 50:     subvolumes gfs33-dht
 51: end-volume
 52: 
 53: volume gfs33-read-ahead
 54:     type performance/read-ahead
 55:     subvolumes gfs33-write-behind
 56: end-volume
 57: 
 58: volume gfs33-io-cache
 59:     type performance/io-cache
 60:     subvolumes gfs33-read-ahead
 61: end-volume
 62: 
 63: volume gfs33-quick-read
 64:     type performance/quick-read
 65:     subvolumes gfs33-io-cache
 66: end-volume
 67: 
 68: volume gfs33-md-cache
 69:     type performance/md-cache
 70:     subvolumes gfs33-quick-read
 71: end-volume
 72: 
 73: volume gfs33
 74:     type debug/io-stats
 75:     option count-fop-hits off
 76:     option latency-measurement off
 77:     option log-level INFO
 78:     subvolumes gfs33-md-cache
 79: end-volume

+------------------------------------------------------------------------------+
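
For readability, the graph described by that volfile is:

    gfs33-client-0, gfs33-client-1 -> gfs33-replicate-0 \
                                                          > gfs33-dht
    gfs33-client-2, gfs33-client-3 -> gfs33-replicate-1 /
    gfs33-dht -> write-behind -> read-ahead -> io-cache
              -> quick-read -> md-cache -> io-stats (gfs33)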

And here is the strange thing: the peers do not agree on peer
status, as if the peer graph were broken. hangar and hotstuff even
report different UUIDs for silo, and two of the states are stuck at
intermediate handshake stages instead of "Peer in Cluster":

hangar# gluster peer status
Number of Peers: 2

Hostname: silo
Uuid: 639e5925-eadb-485f-9eeb-930103a477b0
State: Peer in Cluster (Connected)

Hostname: hotstuff
Uuid: 42cfb8e9-0e7e-43e9-8092-593b80ba6d52
State: Sent and Received peer request (Connected)


hotstuff# gluster peer status 
Number of Peers: 2

Hostname: 192.0.2.98
Uuid: c0959c60-3546-46f7-b34f-7ad1ee52fb5e
State: Peer is connected and Accepted (Connected)

Hostname: silo
Uuid: 4b2f4d0c-2d00-4c2e-aa9a-5069aa50e9c1
State: Peer in Cluster (Connected)


silo# gluster peer status 
Number of Peers: 1

Hostname: 192.0.2.103
Port: 24007
Uuid: 42cfb8e9-0e7e-43e9-8092-593b80ba6d52
State: Peer in Cluster (Connected)
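
In case it is relevant, glusterd persists one file per peer under
/var/lib/glusterd/peers/ (named by UUID, with uuid=, state= and
hostname lines), so comparing those files across the three nodes
should show where the views diverge. The path assumes a default
install prefix:

    # on each node, dump the persisted peer records
    for f in /var/lib/glusterd/peers/*; do
        echo "== $f =="; cat "$f"
    done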

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu at netbsd.org