[Gluster-users] df causes hang

Joe Warren-Meeks joe at encoretickets.co.uk
Sat Jan 15 11:40:56 UTC 2011


Hey guys,

 

I've been using glusterfs to share a volume between two webservers
happily for quite a while.

 

However, for some reason they have got into a state where running
'df -k' causes both to hang, resulting in a loss of service for 42
seconds.

 

Any ideas what might be causing this? The relevant log messages and my
volume files follow.

 

Server1

 

Glusterfs.log: (i.e. the client log)

[2011-01-15 11:22:54] E [saved-frames.c:165:saved_frames_unwind]
10.10.130.11-1: forced unwinding frame type(1) op(LOOKUP)

[2011-01-15 11:22:54] E [saved-frames.c:165:saved_frames_unwind]
10.10.130.11-1: forced unwinding frame type(1) op(LOOKUP)

[2011-01-15 11:22:54] E [saved-frames.c:165:saved_frames_unwind]
10.10.130.11-1: forced unwinding frame type(1) op(LOOKUP)

[2011-01-15 11:22:54] E [saved-frames.c:165:saved_frames_unwind]
10.10.130.11-1: forced unwinding frame type(1) op(LOOKUP)

[2011-01-15 11:22:54] E [saved-frames.c:165:saved_frames_unwind]
10.10.130.11-1: forced unwinding frame type(2) op(PING)

[2011-01-15 11:22:54] N [client-protocol.c:6976:notify] 10.10.130.11-1:
disconnected

[2011-01-15 11:22:54] N [client-protocol.c:6228:client_setvolume_cbk]
10.10.130.11-1: Connected to 10.10.130.11:6996, attached to remote
volume 'brick1'.

[2011-01-15 11:22:54] N [client-protocol.c:6228:client_setvolume_cbk]
10.10.130.11-1: Connected to 10.10.130.11:6996, attached to remote
volume 'brick1'.

 

Glusterfsd.log: (i.e. the server log)

[2011-01-15 11:22:54] N [server-protocol.c:6748:notify] server-tcp:
10.10.130.12:1023 disconnected

[2011-01-15 11:22:54] N [server-protocol.c:6748:notify] server-tcp:
10.10.130.11:1022 disconnected

[2011-01-15 11:22:54] N [server-protocol.c:6748:notify] server-tcp:
10.10.130.12:1022 disconnected

[2011-01-15 11:22:54] N [server-helpers.c:842:server_connection_destroy]
server-tcp: destroyed connection of
w3-4176-2010/10/19-06:35:34:26343-10.10.130.11-1

[2011-01-15 11:22:54] N [server-protocol.c:6748:notify] server-tcp:
10.10.130.11:1018 disconnected

[2011-01-15 11:22:54] N [server-helpers.c:842:server_connection_destroy]
server-tcp: destroyed connection of
w2-827-2011/01/15-11:09:38:7996-10.10.130.11-1

[2011-01-15 11:22:54] N [server-protocol.c:5812:mop_setvolume]
server-tcp: accepted client from 10.10.130.12:1019

[2011-01-15 11:22:54] N [server-protocol.c:5812:mop_setvolume]
server-tcp: accepted client from 10.10.130.12:1018

[2011-01-15 11:22:54] N [server-protocol.c:5812:mop_setvolume]
server-tcp: accepted client from 10.10.130.11:1023

[2011-01-15 11:22:54] N [server-protocol.c:5812:mop_setvolume]
server-tcp: accepted client from 10.10.130.11:1019

 

 

Server2

Client log:

[2011-01-15 11:21:47] E
[client-protocol.c:415:client_ping_timer_expired] 10.10.130.11-1: Server
10.10.130.11:6996 has not responded in the last 42 seconds,
disconnecting.

[2011-01-15 11:21:47] E [saved-frames.c:165:saved_frames_unwind]
10.10.130.11-1: forced unwinding frame type(1) op(STATFS)

[2011-01-15 11:21:47] E [saved-frames.c:165:saved_frames_unwind]
10.10.130.11-1: forced unwinding frame type(1) op(LOOKUP)

[2011-01-15 11:21:47] E [saved-frames.c:165:saved_frames_unwind]
10.10.130.11-1: forced unwinding frame type(1) op(LOOKUP)

[2011-01-15 11:21:47] N [client-protocol.c:6976:notify] 10.10.130.11-1:
disconnected

[2011-01-15 11:22:54] N [client-protocol.c:6228:client_setvolume_cbk]
10.10.130.11-1: Connected to 10.10.130.11:6996, attached to remote
volume 'brick1'.

[2011-01-15 11:22:54] N [client-protocol.c:6228:client_setvolume_cbk]
10.10.130.11-1: Connected to 10.10.130.11:6996, attached to remote
volume 'brick1'.

 

Note that the 2nd server doesn't show anything in the server log.
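
To line up the timestamps, here is a minimal sketch (Python, not part of the setup described here) that rejoins the wrapped log records and measures the gap between a "disconnected" notice and the next "Connected to" notice. The names parse_records and outage_seconds are made up for illustration, and the record layout is assumed from the excerpts above only.

```python
import re
from datetime import datetime

# Assumption: every record starts with "[YYYY-MM-DD HH:MM:SS]" and a level letter,
# as in the excerpts above. Wrapped continuation lines carry no timestamp.
RECORD = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s+([A-Z])\s+")


def parse_records(text):
    """Return (timestamp, level, message) tuples, rejoining wrapped lines."""
    matches = list(RECORD.finditer(text))
    records = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        message = " ".join(text[m.end():end].split())
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        records.append((stamp, m.group(2), message))
    return records


def outage_seconds(records):
    """Seconds from the first 'disconnected' notice to the next 'Connected to'."""
    down = None
    for stamp, _level, message in records:
        if down is None and message.endswith("disconnected"):
            down = stamp
        elif down is not None and "Connected to" in message:
            return (stamp - down).total_seconds()
    return None


# Three records copied from the Server2 client log above.
sample = """\
[2011-01-15 11:21:47] E
[client-protocol.c:415:client_ping_timer_expired] 10.10.130.11-1: Server
10.10.130.11:6996 has not responded in the last 42 seconds,
disconnecting.

[2011-01-15 11:21:47] N [client-protocol.c:6976:notify] 10.10.130.11-1:
disconnected

[2011-01-15 11:22:54] N [client-protocol.c:6228:client_setvolume_cbk]
10.10.130.11-1: Connected to 10.10.130.11:6996, attached to remote
volume 'brick1'.
"""

print(outage_seconds(parse_records(sample)))  # prints 67.0
```

On the Server2 excerpt this gives 67 seconds between the disconnect at 11:21:47 and the reconnect at 11:22:54, on top of the 42 seconds the ping timer had already waited.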

 

My glusterfsd.vol:

volume posix1
  type storage/posix
  option directory /data/export
end-volume

volume brick1
    type features/locks
    subvolumes posix1
end-volume

volume server-tcp
    type protocol/server
    option transport-type tcp
    option auth.addr.brick1.allow *
    option transport.socket.listen-port 6996
    option transport.socket.nodelay on
    subvolumes brick1
end-volume

 

 

My repstore.vol (the client volfile):

## file auto generated by /usr/bin/glusterfs-volgen (mount.vol)
# Cmd line:
# $ /usr/bin/glusterfs-volgen --name repstore1 --raid 1 10.10.130.11:/data/export 10.10.130.12:/data/export

# RAID 1
# TRANSPORT-TYPE tcp
volume 10.10.130.12-1
    type protocol/client
    option transport-type tcp
    option remote-host 10.10.130.12
    option transport.socket.nodelay on
    option transport.remote-port 6996
    option remote-subvolume brick1
end-volume

volume 10.10.130.11-1
    type protocol/client
    option transport-type tcp
    option remote-host 10.10.130.11
    option transport.socket.nodelay on
    option transport.remote-port 6996
    option remote-subvolume brick1
end-volume

volume mirror-0
    type cluster/replicate
    subvolumes 10.10.130.11-1 10.10.130.12-1
end-volume

volume writebehind
    type performance/write-behind
    option cache-size 4MB
    subvolumes mirror-0
end-volume

volume iocache
    type performance/io-cache
    option cache-size `grep 'MemTotal' /proc/meminfo  | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
    option cache-timeout 60
    subvolumes writebehind
end-volume
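
The io-cache cache-size value above is a shell pipeline in backticks: it takes the MemTotal figure (in kB) from /proc/meminfo, multiplies it by 0.2, divides by 1024, and keeps the part before the decimal point, i.e. roughly 20% of RAM in MB. A minimal Python sketch of the same arithmetic follows, for checking what value a given machine would end up with. The function name io_cache_size_mb is hypothetical, and the sample MemTotal line is an illustrative value, not taken from either server.

```python
def io_cache_size_mb(meminfo_text):
    """Mirror the volfile pipeline: integer part of MemTotal_kB * 0.2 / 1024."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal"):
            total_kb = int(line.split()[1])
            return int(total_kb * 0.2 / 1024)
    raise ValueError("MemTotal not found")


# Illustrative input only: a machine reporting exactly 4 GiB of RAM.
example = "MemTotal:        4194304 kB\nMemFree:          123456 kB\n"
print(io_cache_size_mb(example))  # prints 819

# On a live Linux host (assumes /proc/meminfo is readable):
# with open("/proc/meminfo") as fh:
#     print(io_cache_size_mb(fh.read()))
```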

 

  -- joe.

 


 

 
