[Gluster-users] Hanging writes after upgrading "clients" to debian squeeze
Stefan Becker
sbecker at rapidsoft.de
Sun Feb 5 21:24:08 UTC 2012
Hi David,
below is the client log (the file is named after the mount point, like you said). The lenny servers are still working and in production; I tested some of them.
On the client side I tried the following:
- the 3.2.0-3.2.5 debs from the website
- I still had the sources (3.2.0) lying around on the webservers, so I recompiled everything
Neither of them worked.
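
(As an aside, a rough sketch of how to confirm what the squeeze client actually ended up with after the upgrade; the package names are the stock Debian ones and may differ on a source-built install:)

  # gluster and fuse packages currently installed
  dpkg -l | grep -Ei 'glusterfs|fuse'

  # version the running client binary reports
  glusterfs --version

  # fuse kernel module that is actually loaded
  lsmod | grep fuse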
-----Original Message-----
From: David Coulson [mailto:david at davidcoulson.net]
Sent: Sunday, February 5, 2012 22:16
To: Stefan Becker
Cc: Whit Blauvelt; Brian Candler; gluster-users at gluster.org
Subject: Re: [Gluster-users] Hanging writes after upgrading "clients" to debian squeeze
Can you post the client logs as well? There should be a log file
named after the mountpoint of the gluster volume on the client.
Since you are running a replicate volume, you could try shutting down
gluster on each of the servers in turn and seeing whether the writes
only block against one of them.
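
Roughly along these lines (a sketch only; the Debian init script name is an assumption, adjust if you start glusterd differently, and the test path is your mountpoint from the log):

  # on server 10.10.100.40 only (repeat later for 10.10.100.41)
  /etc/init.d/glusterfs-server stop   # stops glusterd; brick processes may keep running
  pkill glusterfsd                    # so stop the brick process as well

  # on the squeeze client, try a small write with a timeout so it cannot hang forever
  timeout 30 dd if=/dev/zero of=/home/XXXstorage/hangtest bs=4k count=1 conv=fsync

  # bring the first server back before testing the second one
  /etc/init.d/glusterfs-server start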
Did fuse get updated as part of your debian upgrade? Perhaps you are
hitting a fuse bug? Do you have another system still running the previous
version of debian that you can use to verify it can connect/write to
the gluster volume properly? Did you recompile your gluster fuse
libraries after the update, so they are built against the version of
fuse you are now running?
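
(Roughly, to check the fuse side; the log path assumes the usual naming where the mountpoint's slashes become dashes under /var/log/glusterfs:)

  # fuse kernel module the squeeze kernel ships
  modinfo fuse | head

  # protocol versions the gluster fuse bridge negotiated at mount time
  grep -i 'FUSE inited' /var/log/glusterfs/home-XXXstorage.log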
On 2/5/12 3:49 PM, Stefan Becker wrote:
> - no iptables involved
> - the servers are running 3.2.0 as well; of course I could upgrade, but that probably means some downtime, which I cannot afford right now
> - did not find anything in the logs, but there are a lot of files so I might have missed something; do you mean logs on the client or the server side?
> - are any debug flags or more verbose logging options available? (see the sketch after this list)
>
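> (A rough sketch for the logging question, using the volume/mount names from the log below; the mount options are the ones documented for 3.2, adjust if your build differs:)
>
>   # remount the client with debug-level logging to a dedicated file
>   umount /home/XXXstorage
>   mount -t glusterfs -o log-level=DEBUG,log-file=/var/log/glusterfs/XXXstorage-debug.log \
>         10.10.100.40:/XXXstorage /home/XXXstorage
>
>   # or, if your 3.2 build supports it, raise the client log level volume-wide
>   gluster volume set XXXstorage diagnostics.client-log-level DEBUG
>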
> the brick logs on the server side just say "client connected", so there is not a lot of value in them. On the client side I have the following:
>
> [2012-02-05 19:38:53.324172] I [fuse-bridge.c:3214:fuse_thread_proc] 0-fuse: unmounting /home/XXXstorage
> [2012-02-05 19:38:53.324221] I [glusterfsd.c:712:cleanup_and_exit] 0-glusterfsd: shutting down
> [2012-02-05 19:38:58.709783] W [write-behind.c:3023:init] 0-XXXstorage-write-behind: disabling write-behind for first 0 bytes
> [2012-02-05 19:38:58.711289] I [client.c:1935:notify] 0-XXXstorage-client-0: parent translators are ready, attempting connect on transport
> [2012-02-05 19:38:58.711489] I [client.c:1935:notify] 0-XXXstorage-client-1: parent translators are ready, attempting connect on transport
> Given volfile:
> +------------------------------------------------------------------------------+
> 1: volume XXXstorage-client-0
> 2: type protocol/client
> 3: option remote-host 10.10.100.40
> 4: option remote-subvolume /brick1
> 5: option transport-type tcp
> 6: end-volume
> 7:
> 8: volume XXXstorage-client-1
> 9: type protocol/client
> 10: option remote-host 10.10.100.41
> 11: option remote-subvolume /brick1
> 12: option transport-type tcp
> 13: end-volume
> 14:
> 15: volume XXXstorage-replicate-0
> 16: type cluster/replicate
> 17: subvolumes XXXstorage-client-0 XXXstorage-client-1
> 18: end-volume
> 19:
> 20: volume XXXstorage-write-behind
> 21: type performance/write-behind
> 22: subvolumes XXXstorage-replicate-0
> 23: end-volume
> 24:
> 25: volume XXXstorage-read-ahead
> 26: type performance/read-ahead
> 27: subvolumes XXXstorage-write-behind
> 28: end-volume
> 29:
> 30: volume XXXstorage-io-cache
> 31: type performance/io-cache
> 32: subvolumes XXXstorage-read-ahead
> 33: end-volume
> 34:
> 35: volume XXXstorage-stat-prefetch
> 36: type performance/stat-prefetch
> 37: subvolumes XXXstorage-io-cache
> 38: end-volume
> 39:
> 40: volume XXXstorage
> 41: type debug/io-stats
> 42: option latency-measurement off
> 43: option count-fop-hits off
> 44: subvolumes XXXstorage-stat-prefetch
> 45: end-volume
>
> +------------------------------------------------------------------------------+
> [2012-02-05 19:38:58.712460] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 0-XXXstorage-client-1: changing port to 24015 (from 0)
> [2012-02-05 19:38:58.712527] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 0-XXXstorage-client-0: changing port to 24012 (from 0)
> [2012-02-05 19:39:02.709882] I [client-handshake.c:1080:select_server_supported_programs] 0-XXXstorage-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
> [2012-02-05 19:39:02.710112] I [client-handshake.c:1080:select_server_supported_programs] 0-XXXstorage-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
> [2012-02-05 19:39:02.710355] I [client-handshake.c:913:client_setvolume_cbk] 0-XXXstorage-client-1: Connected to 10.10.100.41:24015, attached to remote volume '/brick1'.
> [2012-02-05 19:39:02.710395] I [afr-common.c:2514:afr_notify] 0-XXXstorage-replicate-0: Subvolume 'XXXstorage-client-1' came back up; going online.
> [2012-02-05 19:39:02.712314] I [fuse-bridge.c:3316:fuse_graph_setup] 0-fuse: switched to graph 0
> [2012-02-05 19:39:02.712387] I [client-handshake.c:913:client_setvolume_cbk] 0-XXXstorage-client-0: Connected to 10.10.100.40:24012, attached to remote volume '/brick1'.
> [2012-02-05 19:39:02.712436] I [fuse-bridge.c:2897:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
> [2012-02-05 19:39:02.713253] I [afr-common.c:836:afr_fresh_lookup_cbk] 0-XXXstorage-replicate-0: added root inode
>
> I cannot see any problems. I was tailing a few logs while I issued a write that hangs; nothing gets logged.
>
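> (One more data point that might narrow it down, sketched roughly; the test path is your mountpoint, everything else is generic:)
>
>   # see which syscall the hanging writer is stuck in
>   strace -f -tt -o /tmp/hang.strace \
>       dd if=/dev/zero of=/home/XXXstorage/hangtest bs=4k count=1
>
>   # and whether the glusterfs client process does anything at the same time
>   # (assumes exactly one glusterfs client process matches)
>   strace -f -tt -p $(pgrep -f 'glusterfs.*XXXstorage') -o /tmp/client.strace
>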
> -----Original Message-----
> From: Whit Blauvelt [mailto:whit.gluster at transpect.com]
> Sent: Sunday, February 5, 2012 21:25
> To: Brian Candler
> Cc: Stefan Becker; gluster-users at gluster.org
> Subject: Re: [Gluster-users] Hanging writes after upgrading "clients" to debian squeeze
>
> On Sun, Feb 05, 2012 at 07:36:55PM +0000, Brian Candler wrote:
>> On Sun, Feb 05, 2012 at 08:02:08PM +0100, Stefan Becker wrote:
>>> After the debian upgrade I can
>>> still mount my volumes. Reading is fine as well but it hangs on writes.
>> Could it be that on the post-upgrade machines one brick is reachable but not
>> the other? Compare iptables rules between the pre-upgrade and post-upgrade
>> machines? Compare tcpdump or ntop between them?
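>>
>> (For the tcpdump part, something like this on the client during a hanging write; eth0 is an assumption:)
>>
>>   tcpdump -ni eth0 'tcp and (host 10.10.100.40 or host 10.10.100.41)'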
> If you can, try dropping iptables out of the picture entirely. If you are
> running it, and have it logging what it drops, the docs say "Ensure that TCP
> ports 111, 24007, 24008, 24009-(24009 + number of bricks across all volumes)
> are open on all Gluster servers. If you will be using NFS, open additional
> ports 38465 to 38467." So I'd check your logs to see if iptables is dropping
> any traffic to/from the IPs in question on those ports.
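>
> (Roughly, on each server; the ACCEPT rule below is only a sketch and should be restricted to your client addresses in production. Note that the client log above shows the bricks on 24012 and 24015, i.e. above the range the docs mention, so include those too:)
>
>   # is iptables dropping anything at all?
>   iptables -L -n -v | grep -iE 'drop|reject'
>
>   # open portmapper, the gluster management ports and the brick port range
>   iptables -A INPUT -p tcp -m multiport --dports 111,24007,24008,24009:24016 -j ACCEPT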
>
> Or us "netstat -tc" while doing some file operations, and you should see the
> traffic on the IPs/ports. Another utility to see the same thing is "iptraf."
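>
> (With the brick ports from the client log above, a quick check from the client would look something like:)
>
>   netstat -tn | grep -E '10\.10\.100\.4[01]:(24007|2401[25])'
>   # expect one ESTABLISHED line per brick (ports 24012 and 24015 here)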
>
> Whit
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users